Clustering Web Sessions Using Extended General Pages

Authors

  • Zhongming Ma
  • Olivia R. Liu Sheng
Abstract

We study Web session clustering in order to find groups of similar sessions and discover user access patterns on a Web site. We extend the general page concept presented in (Fu, Sandhu and Shih 2000) by including partial document names and dynamic pages, and use an extended general page (EGP) to represent the many individual page URLs that share the same EGP. We present two extensions of ROCK (Guha, Rastogi and Shim 2000), a hierarchical clustering algorithm. The first is a notion of EGP count that we add to the session similarity calculation; the second is a goodness threshold we adopt to prevent certain clusters from merging with others. Further, we propose a set of measurements for assessing the results of clustering Boolean and categorical data and for helping users identify their desired clustering results. In our experiments, we applied the ROCK and the extended ROCK (E-ROCK) algorithms to cluster a half-month Web log from a customer service Web site at HP. The experimental results showed that E-ROCK alleviated a large-cluster problem of the ROCK algorithm and improved intra-cluster similarity.
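
The pipeline in the abstract (map URLs to EGPs, treat each session as a set of EGPs, cluster with ROCK under a goodness threshold) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation. `to_egp` is a hypothetical generalization rule, since the abstract does not give the actual rules. Similarity is plain Jaccard and omits the paper's EGP-count term, whose form the abstract does not specify. `min_goodness` is one assumed reading of the goodness threshold (a merge is allowed only if its goodness exceeds it). The link and goodness definitions follow ROCK as commonly stated, with f(θ) = (1 - θ)/(1 + θ), and the merge loop is a naive quadratic scan rather than ROCK's heap-based procedure.

```python
import re
from urllib.parse import urlsplit


def to_egp(url: str) -> str:
    """Hypothetical EGP mapping (not the paper's exact rules): drop the query
    string so dynamic pages collapse together, and strip trailing digits from
    the file name so partial document names share one EGP."""
    path = urlsplit(url).path
    head, _, name = path.rpartition("/")
    stem = re.sub(r"\d+(\.\w+)?$", "", name)  # "doc123.html" -> "doc"
    return f"{head}/{stem}*" if stem != name else path


def jaccard(a: frozenset, b: frozenset) -> float:
    """Set similarity between two sessions (sets of EGPs)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rock(sessions, theta=0.5, k=2, min_goodness=0.0):
    """Naive ROCK-style agglomerative clustering; assumes 0 <= theta < 1.
    Assumed goodness threshold: merge only if goodness > min_goodness."""
    n = len(sessions)
    e = 1 + 2 * (1 - theta) / (1 + theta)  # exponent 1 + 2 f(theta)
    # Neighbors: other sessions whose similarity reaches theta.
    nbrs = [{j for j in range(n)
             if j != i and jaccard(sessions[i], sessions[j]) >= theta}
            for i in range(n)]
    # link(i, j) = number of common neighbors of sessions i and j.
    link = [[len(nbrs[i] & nbrs[j]) for j in range(n)] for i in range(n)]

    def goodness(ci, cj):
        # Cross links normalized by the expected number of links.
        cross = sum(link[p][q] for p in ci for q in cj)
        ni, nj = len(ci), len(cj)
        return cross / ((ni + nj) ** e - ni ** e - nj ** e)

    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        best, pair = min_goodness, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                g = goodness(clusters[a], clusters[b])
                if g > best:
                    best, pair = g, (a, b)
        if pair is None:  # no pair clears the goodness threshold
            break
        a, b = pair
        clusters[a] += clusters.pop(b)  # b > a, so index a is unaffected
    return sorted(sorted(c) for c in clusters)
```

For example, `to_egp("http://example.com/support/doc123.html?id=7")` returns `"/support/doc*"`. Because merges must clear `min_goodness`, clusters with no cross links stay separate even when `k` asks for fewer clusters, which is one way a threshold can keep unrelated sessions from being absorbed into a single large cluster.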

Similar resources

Clustering of Web Users Based on Access Patterns

The clustering of Web users based on their access patterns is studied. Access patterns of the Web users are extracted from Web servers' log files, and then organized into sessions which represent episodes of interaction between Web users and the Web server. Using attribute-oriented induction, the sessions are then generalized according to the page hierarchy which organizes pages according to...

Clustering Web Sessions by Sequence Alignment

Clustering means grouping similar objects into groups such that objects within the same group bear similarity to each other while objects in different groups are dissimilar to each other. As an important component of data mining, much research on clustering has been conducted in different disciplines. In the context of web mining, clustering could be used to cluster similar clickstreams to determ...

Use of Semantic Similarity and Web Usage Mining to Alleviate the Drawbacks of User-Based Collaborative Filtering Recommender Systems

One of the most famous methods for recommendation is user-based Collaborative Filtering (CF). This system compares the active user's item ratings with the historical rating records of other users to find similar users, and recommends items that seem interesting to these similar users and have not been rated by the active user. As a way of computing recommendations, the ultimate goal of the user-ba...

Web sessions clustering for behavioral targeting

In this paper we present our ongoing effort to compare web session clusters based on different web session representations. Sessions within the same cluster represent common navigation patterns. We assume that users with the same navigation patterns have common interests and motivations at a point in time. Therefore, we represent sessions based on descriptions extracted from the URLs as well a...

Study and Evaluation of user’s behavior in e-commerce Using Data Mining

Data mining has matured as a field of basic and applied research in computer science. The objective of this dissertation is to evaluate, propose and improve the use of some of the recent approaches, architectures and Web mining techniques (collecting personal information from customers) are the means of utilizing data mining methods to induce and extract useful information from Web information ...

Publication date: 2004